Hany Hassan , Khalil Sima ’ an and Andy Way ◦
نویسندگان
چکیده
Until quite recently, extending Phrase-based Statistical Machine Translation (PBSMT) with syntactic knowledge caused system performance to deteriorate. The most recent successful enrichments of PBSMT with hierarchical structure either employ non-linguistically motivated syntax for capturing hierarchical reordering phenomena, or extend the phrase translation table with redundantly ambiguous syntactic structures over phrase pairs. In this work we present an extended, harmonised account of our previous work which showed that incorporating linguistically motivated lexical syntactic descriptions, called supertags, can yield significantly better PBSMT systems at insignificant extra computational cost. We describe a novel PBSMT model that integrates supertags into the target language model and the target side of the translation model. Two kinds of supertags are employed: those from Lexicalized TreeAdjoining Grammar and Combinatory Categorial Grammar. Despite the differences between the two sets of supertags, they give similar improvements. In addition to integrating the Markov supertagging approach in PBSMT, we explore the utility of a new surface grammaticality measure based on combinatory operators. We perform various experiments on the Arabic to English NIST 2005 test set addressing the issues of sparseness, scalability and the utility of system subcomponents. We show that even when the parallel training data grows very large, the supertagged system retains a relatively stable absolute performance advantage over the unadorned PBSMT system. Arguably, this hints at a performance gap that cannot be bridged by acquiring more phrase pairs. Our best result shows a relative improvement of 6.1% over a state-of-the-art PBSMT model, which compares favourably with the leading systems on the NIST 2005 task. We also demonstrate that the advantages of a supertag-based system carry over to German–English; especially when the negative effects of the brevity penalty are disregarded, improvements of up to 8.9% relative to the baseline system are observed.
منابع مشابه
A Syntactified Direct Translation Model with Linear-time Decoding
Recent syntactic extensions of statistical translation models work with a synchronous context-free or tree-substitution grammar extracted from an automatically parsed parallel corpus. The decoders accompanying these extensions typically exceed quadratic time complexity. This paper extends the Direct Translation Model 2 (DTM2) with syntax while maintaining linear-time decoding. We employ a linea...
متن کاملSupertagged Phrase-Based Statistical Machine Translation
Until quite recently, extending Phrase-based Statistical Machine Translation (PBSMT) with syntactic structure caused system performance to deteriorate. In this work we show that incorporating lexical syntactic descriptions in the form of supertags can yield significantly better PBSMT systems. We describe a novel PBSMT model that integrates supertags into the target language model and the target...
متن کاملLexicalized Semi-incremental Dependency Parsing
Even leaving aside concerns of cognitive plausibility, incremental parsing is appealing for applications such as speech recognition and machine translation because it could allow for incorporating syntactic features into the decoding process without blowing up the search space. Yet, incremental parsing is often associated with greedy parsing decisions and intolerable loss of accuracy. Would the...
متن کاملMatrex: the DCU machine translation system for IWSLT 2007
In this paper, we give a description of the machine translation system developed at DCU that was used for our second participation in the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT 2007). In this participation, we focus on some new methods to improve system quality. Specifically, we try our word packing technique for different language pairs, we smoo...
متن کاملDesign and construction of a 4G mobile network antenna
.................................................................................................................................... 7
متن کامل